Pseudo-convergent Q-Learning by Competitive Pricebots

Authors

  • Jeffrey O. Kephart
  • Gerald Tesauro
Abstract

We study novel aspects of multi-agent Q-learning in a model market in which two identical, competing "pricebots" strategically price a commodity. Two fundamentally different solutions are observed: an exact, stationary solution with zero Bellman error consisting of symmetric policies, and a non-stationary, broken-symmetry pseudo-solution, with small but non-zero Bellman error. This "pseudo-convergent" asymmetric solution has no analog in ordinary Q-learning. We calculate analytically the form of both solutions, and map out numerically the conditions under which each occurs. We suggest that this observed behavior will also be found more generally in other studies of multi-agent Q-learning, and discuss implications and directions for future research.
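
The distinction the abstract draws between zero and non-zero Bellman error can be illustrated with a minimal sketch. It uses the ordinary single-agent definition of the Bellman residual on a tiny deterministic decision process, not the paper's two-seller market model. The toy rewards `R`, transitions `next_state`, discount `gamma`, and the helper names `bellman_error` and `solve_q` are all assumptions made for illustration.

```python
import numpy as np

def bellman_error(Q, R, next_state, gamma):
    """Largest absolute Bellman residual of Q on a deterministic process.

    Q[s, a]          : current action-value estimates
    R[s, a]          : immediate reward for action a in state s
    next_state[s, a] : index of the successor state
    gamma            : discount factor in [0, 1)
    """
    # One-step lookahead target: reward plus discounted best successor value.
    target = R + gamma * Q[next_state].max(axis=-1)
    return np.abs(Q - target).max()

def solve_q(R, next_state, gamma, sweeps=500):
    """Iterate the Bellman optimality backup until Q is (numerically) fixed."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(sweeps):
        Q = R + gamma * Q[next_state].max(axis=-1)
    return Q

# Hypothetical toy problem: 2 states, 2 actions (illustrative values only).
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
next_state = np.array([[0, 1],
                       [0, 1]])
gamma = 0.5

Q_star = solve_q(R, next_state, gamma)
exact_error = bellman_error(Q_star, R, next_state, gamma)

# Perturbing one entry yields a table that no longer satisfies the equation.
Q_off = Q_star.copy()
Q_off[0, 0] += 0.1
perturbed_error = bellman_error(Q_off, R, next_state, gamma)

print(exact_error, perturbed_error)
```

In this single-agent sketch a converged table has a residual that is numerically zero, while the perturbed table has a small positive one. The abstract's pseudo-convergent solution is a multi-agent phenomenon in which a small residual persists, which this toy example does not reproduce.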

Similar resources

Coco-Q: Learning in Stochastic Games with Side Payments

Coco (“cooperative/competitive”) values are a solution concept for two-player normal-form games with transferable utility, when binding agreements and side payments between players are possible. In this paper, we show that coco values can also be defined for stochastic games and can be learned using a simple variant of Q-learning that is provably convergent. We provide a set of examples showing ...

Shopbots and Pricebots

Shopbots are software agents that automatically gather and collate information from multiple on-line vendors about the price and quality of consumer goods and services. Rapidly increasing in number and sophistication, shopbots are helping more and more buyers minimize expenditure and maximize satisfaction. In response to this trend, it is anticipated that sellers will come to rely on pricebots,...

An Online Convergent Q-learning Algorithm with Linear Function Approximation

We present in this article a variant of Q-learning with linear function approximation that is based on two-timescale stochastic approximation. Whereas it is difficult to prove convergence of regular Q-learning with linear function approximation because of the off-policy problem, we prove that our algorithm is convergent. Numerical results on a multi-stage stochastic shortest path problem show t...

Joint Action Learners in Competitive Stochastic Games

This thesis investigates the design of adaptive utility maximizing software agents for competitive multi-agent settings. The focus is on evaluating the theoretical and empirical performance of Joint Action Learners (JALs) in settings modeled as stochastic games. JALs extend the well-studied Q-learning algorithm. A previously introduced JAL optimizes with respect to stationary or convergent oppo...

Construction and Validation of a Scale for Measuring the Organizational Learning Process

Organizational learning is the process of updating and changing an organization's shared mental models by acquiring data, using information, and creating and institutionalizing knowledge within the organization, which in turn yields competitive advantage, profitability growth, and ultimately improved organizational performance. In other words, organizational learning essentially aims the organizat...

Publication date: 2000